Proteome-wide analysis of protein function composition reveals the clustering and phylogenetic properties of organisms.
نویسندگان
چکیده
A 17-dimensional vector named the proteome vector is defined to represent an organism. The components of the vector reflect the relative contents of protein-encoding genes of the 17 cluster of orthologous groups of proteins (COGs) classes in the whole genome of the relevant organism. Based on the definition of this proteome vector, the fuzzy clustering of 36 completely sequenced organisms (8 archaea, 24 bacteria, and 4 eukarya) was performed and a proteome tree was constructed. Our results show that (1) the 36 organisms can be 100% correctly classified into three clusters corresponding to the three primary kingdoms, (2) our proteome tree is remarkably similar to that derived from 16S rRNA, and (3) the chromosomes and/or plasmids belonging to the same organism have very similar gene composition. Based on these results, we argue that the 17-dimensional proteome vector could be a good criterion for clustering approaches and to a large extent reveals the phylogenetic properties of organisms; the Three Primary Kingdoms Hypothesis is trustworthy although the existence of lateral gene transfer (LGT) brings controversy to the construction of the "universal tree of life."
منابع مشابه
Distinct Protein Classes in Human Red Cell Proteome Revealed by Similarity of Phylogenetic Profiles
The minimal set of proteins necessary to maintain a vertebrate cell forms an interesting core of cellular machinery. The known proteome of human red blood cell consists of about 1400 proteins. We treated this protein complement of one of the simplest human cells as a model and asked the questions on its function and origins. The proteome was mapped onto phylogenetic profiles, i.e. vectors of sp...
متن کاملPhylogenetic analysis of HSP70 gene of Aspergillus fumigatus reveals conservation intra-species and divergence inter-species
Aspergillus fumigatus is a saprophyte fungus, widely spread in a variety of ecologicalniches and the most prevalent aspergilli responsible for human and animal invasiveaspergillosis. The first step to develop novel and efficient therapies is the identificationand understanding of the key tolerance and virulence factors of pathogens. The mainfocus of the present study is to perform the similarit...
متن کاملGenome cluster database. A sequence family analysis platform for Arabidopsis and rice.
The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties to compensate the limitations for accurate family analysis. Functional names for the identified f...
متن کاملPhylogenetic Analysis of Three Long Non-coding RNA Genes: AK082072, AK043754 and AK082467
Now, it is clear that protein is just one of the most functional products produced by the eukaryotic genome. Indeed, a major part of the human genome is transcribed to non-coding sequences than to the coding sequence of the protein. In this study, we selected three long non-coding RNAs namely AK082072, AK043754 and AK082467 which show brain expression and local region conservation among vertebr...
متن کاملBioinformatics Genome Cluster Database . A Sequence Family Analysis Platform for Arabidopsis and Rice 1
The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties to compensate the limitations for accurate family analysis. Functional names for the identified f...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Molecular phylogenetics and evolution
دوره 25 1 شماره
صفحات -
تاریخ انتشار 2002